I. Introduction

World leaders are influential people who have the power to affect millions of lives. Some leaders are elected to terms while others serve for life, so their expected survival is of great interest. In this paper, we compare Popes, US Presidents, Dalai Lamas, Chinese Emperors, and Japanese Emperors to see how their lifespans differ. Additionally, we analyze the effect of a leader’s birth year on survival, and whether this effect varies by type of leadership. After building a model to estimate these effects, we examine some specific cases: the probability that the current 14th Dalai Lama will outlive Pope Francis, and the probability that President Obama will outlive Emperor Naruhito. We accomplish this using survival analysis, with Bayesian inference via the JAGS program to estimate the model parameters.

The rest of the paper is structured as follows. First, we describe the data used in our analysis. Next, we describe our methods. Then, we show how we carried out our analysis plan and report our results. After that, we discuss our conclusions and areas for further research. Finally, our code and some additional information can be found in the Appendix.

II. Data

The data used in this analysis contain entries for 177 world leaders: Popes, US Presidents, Dalai Lamas, Chinese Emperors, and Japanese Emperors. Each leader’s birth date and type of leadership are recorded; for some groups, the data date back to the 14th century. Leaders who have passed away also have their death date and age at death recorded. For leaders who are still living, these columns instead contain the date the dataset was created (July 31, 2020) and their age on that date. To clarify who is dead or alive, a column titled “Censored” takes a value of 0 if the person has died and a value of 1 if the person is still alive. Related to this, another column called “Fail” takes a value of 1 if the person is dead and a value of 0 if the person is alive. The dataset as provided marks 11 leaders as alive, but one of these (the Jianwen Emperor) is a data-entry error that we correct during cleaning, leaving 10 living leaders, 4 of whose ages at death we aim to predict.

III. Methods

III a. Motivating the Model

We use survival analysis to model \(T_i\), the lifespan of leader i, as a function of birth year and type of leadership. Survival analysis is useful because it accommodates “censored” data: we do not always observe the outcome for each data point. For example, we do not know the death dates of leaders who are still alive, so we do not know their lifespans; we only know that their survival time \(T_i\) will exceed their current age.

We model the lifespan \(T_i\) of an individual i with the Weibull distribution specified below. We choose the Weibull distribution because it is widely used in survival analysis and its shape parameter makes it flexible. The first parameter \(r\), the shape parameter, is a positive scalar; the second parameter \(\mu_i\), the scale parameter, has a logarithm that is a linear function of the covariates (birth year, type of leadership, and their interactions). \[ T_i \sim \text{Weibull}(r, \mu_i) \] \[\log(\mu_i) = \beta_0+\beta_1 (\text{Year of Birth}_i)+\beta_2 (\text{Leadership}_i = \text{UsPres}) + \beta_3 (\text{Leadership}_i = \text{ChinaEmp}) \] \[+ \beta_4 (\text{Leadership}_i = \text{DalaiLama}) +\beta_5 (\text{Leadership}_i = \text{JapanEmp}) \] \[+ \beta_6 (\text{Year of Birth}_i \times (\text{Leadership}_i = \text{UsPres})) \] \[+ \beta_7 (\text{Year of Birth}_i \times (\text{Leadership}_i = \text{ChinaEmp})) \] \[+ \beta_8 (\text{Year of Birth}_i \times (\text{Leadership}_i = \text{DalaiLama})) \] \[+ \beta_9 (\text{Year of Birth}_i \times (\text{Leadership}_i = \text{JapanEmp}))\] where \(\text{Leadership}_i = \text{Pope}\) is the baseline for comparison.
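For reference, under the parameterization that JAGS’s dweib distribution uses (and on which our code below relies), the density and survival function of \(T \sim \text{Weibull}(r, \mu)\) are \[ f(t \mid r, \mu) = \mu\, r\, t^{r-1} \exp(-\mu t^{r}), \qquad S(t \mid r, \mu) = \Pr(T > t) = \exp(-\mu t^{r}), \quad t > 0. \] The survival function \(S(t)\) is the probability of living past age \(t\); it is what each censored observation contributes to the likelihood.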

III b. Addressing Censored Data

To handle censored observations, we specify their contribution to the likelihood function using the “zeroes trick.” Because the sampling distribution of a censored observation is not a standard distribution available in JAGS, we supply a pseudo-observation of 0 modeled as Poisson(\(\phi_i\)): the likelihood of observing a zero from a Poisson(\(\phi_i\)) distribution is \(\exp(-\phi_i)\), so setting \(\phi_i = -\log(L_i)\) makes each pseudo-observation contribute the desired likelihood \(L_i\). This is necessary because we do not know the lifespans of currently living people and need a way to model their expected survival times. The method is also generally computationally efficient (Stander et al., 2018).
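Concretely, for our Weibull model a leader censored at age \(t_i\) contributes the survival probability, which gives the Poisson rate directly: \[ L_i = S(t_i) = \exp(-\mu_i t_i^{r}) \quad\Longrightarrow\quad \phi_i = -\log L_i = \mu_i t_i^{r}. \] A pseudo-observation of zero from Poisson(\(\phi_i\)) then has likelihood \(\exp(-\phi_i) = S(t_i)\), exactly the contribution of a right-censored observation; this is what the line phi_censored[i] <- mu_censored[i] * pow(t_censored[i], r) in our JAGS code implements.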

III c. Prior Choice

We assume that our priors are independent; intuitively, a given leader cannot belong to two leadership types at once. Additionally, we do not have sufficiently strong prior knowledge about the relationship between birth year, type of leadership, and lifespan to specify informative priors. Therefore, we use uninformative priors for the betas in our model.
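Note that JAGS parameterizes the normal distribution by precision \(\tau = 1/\sigma^2\) rather than variance, so the vague prior dnorm(0.0, 1.0E-3) used in our code corresponds to \[ \beta_k \sim N(0,\ \sigma^2 = 1000), \qquad \sigma \approx 31.6, \] which is effectively flat over any plausible value of the coefficients.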

For \(r\), we choose an Exponential(1) prior, as we believe the hazard of death increases with age.
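This belief can be made precise through the Weibull hazard function: \[ h(t) = \frac{f(t)}{S(t)} = \mu\, r\, t^{r-1}, \] which is increasing in \(t\) exactly when \(r > 1\). An Exponential(1) prior has mean 1 and full support on \((0, \infty)\), so it allows, without forcing, an increasing hazard.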

library(R2jags)
library(tidyverse)
library(lubridate)
library(knitr)
library(broom)
leaders_data <- get(load("Leaders.RData"))
leaders_data$Censored[leaders_data$Name == "Jianwen Emperor"] <- "0" # fix data-entry error: the Jianwen Emperor is deceased

leaders_data$Censored <- as.factor(leaders_data$Censored)
leaders_data$Type <- as.factor(leaders_data$Type)
leaders_data$Age.Event <- as.numeric(leaders_data$Age.Event)


summary(leaders_data)
##      Name             Birth.Date           Event.Date           Age.Event     
##  Length:177         Min.   :1324-05-14   Min.   :1368-03-29   Min.   : 9.259  
##  Class :character   1st Qu.:1507-09-16   1st Qu.:1572-07-05   1st Qu.:53.985  
##  Mode  :character   Median :1705-10-31   Median :1758-05-03   Median :67.529  
##                     Mean   :1674-01-23   Mean   :1737-03-27   Mean   :63.170  
##                     3rd Qu.:1833-08-20   3rd Qu.:1886-11-18   3rd Qu.:78.242  
##                     Max.   :1961-08-04   Max.   :2020-07-31   Max.   :95.830  
##  Censored        Type         Fail           centdate               time       
##  0:167    ChinaEmp :26   Min.   :0.0000   Min.   :1300-01-01   Min.   :0.2436  
##  1: 10    DalaiLama:14   1st Qu.:1.0000   1st Qu.:1300-01-01   1st Qu.:2.0770  
##           JapanEmp :30   Median :1.0000   Median :1300-01-01   Median :4.0582  
##           Pope     :63   Mean   :0.9379   Mean   :1300-01-01   Mean   :3.7406  
##           UsPres   :44   3rd Qu.:1.0000   3rd Qu.:1300-01-01   3rd Qu.:5.3362  
##                          Max.   :1.0000   Max.   :1300-01-01   Max.   :6.6157

Boxplot of age at event (Age.Event) by leader type, followed by a scatterplot of age against birth date:

ggplot(data = leaders_data) +
  geom_boxplot(aes(x = Type, y = Age.Event))

ggplot(data = leaders_data) +
  geom_point(aes(x = Birth.Date, y = Age.Event))

Our Model

\[ T_i \sim \text{Weibull}(r, \mu_i) \\ \log(\mu_i) = \beta_0 + \beta_1x_{i1}+\beta_2x_{i2}+...+\beta_px_{ip}\\ \] We model \(\log(\mu_i)\) rather than \(\mu_i\) so that \(\mu_i\) is always positive (the Weibull distribution requires \(\mu > 0\)).

More specifically,

\[ T_i \sim \text{Weibull}(r, \mu_i) \\ \log(\mu_i) = \beta_0+\beta_1 (\text{Year of Birth}_i)+\beta_2 (\text{Leadership}_i = \text{UsPres}) + \beta_3(\text{Leadership}_i = \text{ChinaEmp}) \\ + \beta_4(\text{Leadership}_i = \text{DalaiLama}) +\beta_5(\text{Leadership}_i = \text{JapanEmp}) \\ + \beta_6(\text{Year of Birth}_i \times (\text{Leadership}_i = \text{UsPres})) \\ + \beta_7(\text{Year of Birth}_i \times (\text{Leadership}_i = \text{ChinaEmp})) \\ + \beta_8(\text{Year of Birth}_i \times (\text{Leadership}_i = \text{DalaiLama})) \\ + \beta_9(\text{Year of Birth}_i \times (\text{Leadership}_i = \text{JapanEmp})) \]

Data Cleaning

# 10 censored data
leaders_data$Name[leaders_data$Censored == "1"]
##  [1] "Francis"                         "Benedict XVI"                   
##  [3] "Jimmy Carter"                    "Bill Clinton"                   
##  [5] "George W. Bush"                  "Barack Obama"                   
##  [7] "Donald Trump"                    "14th Dalai Lama (Tenzin Gyatso)"
##  [9] "Akihito (Emperor Heisei)"        "Naruhito (Emperor Reiwa)"
leaders_data$Birth.Year <- year(leaders_data$Birth.Date)

leaders_data <- leaders_data %>%
  mutate(TypeChinaEmp = as.factor(case_when(Type == "ChinaEmp" ~ 1,
                             TRUE ~ 0)),
         TypeDalaiLama = as.factor(case_when(Type == "DalaiLama" ~ 1,
                                   TRUE ~ 0)),
         TypeJapanEmp = as.factor(case_when(Type == "JapanEmp" ~ 1, 
                                  TRUE ~ 0)),
         TypePope = as.factor(case_when(Type == "Pope" ~ 1, 
                              TRUE ~ 0)),
         TypeUsPres = as.factor(case_when(Type == "UsPres" ~ 1,
                                TRUE ~ 0)))

leaders_nopred <- leaders_data %>%
  filter(!(Name %in% c("14th Dalai Lama (Tenzin Gyatso)", "Naruhito (Emperor Reiwa)", "Barack Obama", "Francis")))

survival_raw <- leaders_nopred$Age.Event
n <- length(survival_raw)

censored <- leaders_nopred$Censored
survival <- ifelse(censored == 0, survival_raw, NA)

censoring_limits <- ifelse(censored == 0, 100, survival_raw)
x_1 <- leaders_nopred$Birth.Year - mean(leaders_nopred$Birth.Year)
x_2 <- leaders_nopred$TypeUsPres
x_3 <- leaders_nopred$TypeChinaEmp
x_4 <- leaders_nopred$TypeDalaiLama
x_5 <- leaders_nopred$TypeJapanEmp
# add interaction b/w Birth.Year and Type
x_6 <- x_1*as.numeric(as.character(x_2))
x_7 <- x_1*as.numeric(as.character(x_3))
x_8 <- x_1*as.numeric(as.character(x_4))
x_9 <- x_1*as.numeric(as.character(x_5))
set.seed(123)
build_model <- function() {
  for(i in 1:n_censored) {
    z_censored[i] ~ dpois(phi_censored[i])
    phi_censored[i] <- mu_censored[i] * pow(t_censored[i], r)
    mu_censored[i] <- exp(beta_censored[i])
    beta_censored[i] <- beta_0 + beta_1*x_1_censored[i] + beta_2*x_2_censored[i] + beta_3*x_3_censored[i] + beta_4*x_4_censored[i] + beta_5*x_5_censored[i] + beta_6*x_6_censored[i] + beta_7*x_7_censored[i] + beta_8*x_8_censored[i] + beta_9*x_9_censored[i]
  }
  
  for(j in 1:n_non_censored) { # observed lifetimes: total rows minus the 6 censored leaders remaining in leaders_nopred
    survival_non_censored[j] ~ dweib(r, mu[j])
    mu[j] <- exp(beta[j])
    beta[j] <- beta_0 + beta_1*x_1_non_censored[j] + beta_2*x_2_non_censored[j] + beta_3*x_3_non_censored[j] + beta_4*x_4_non_censored[j] + beta_5*x_5_non_censored[j] + beta_6*x_6_non_censored[j] + beta_7*x_7_non_censored[j] + beta_8*x_8_non_censored[j] + beta_9*x_9_non_censored[j]
  }
  
  beta_0 ~ dnorm(0.0, 1.0E-3) # Prior on beta_0 is normal with low precision
  beta_1 ~ dnorm(0.0, 1.0E-3) # Prior on beta_1 is normal with low precision
  beta_2 ~ dnorm(0.0, 1.0E-3) # Prior on beta_2 is normal with low precision
  beta_3 ~ dnorm(0.0, 1.0E-3) # Prior on beta_3 is normal with low precision
  beta_4 ~ dnorm(0.0, 1.0E-3) # Prior on beta_4 is normal with low precision
  beta_5 ~ dnorm(0.0, 1.0E-3) # Prior on beta_5 is normal with low precision
  beta_6 ~ dnorm(0.0, 1.0E-3) # Prior on beta_6 is normal with low precision
  beta_7 ~ dnorm(0.0, 1.0E-3) # Prior on beta_7 is normal with low precision
  beta_8 ~ dnorm(0.0, 1.0E-3) # Prior on beta_8 is normal with low precision
  beta_9 ~ dnorm(0.0, 1.0E-3) # Prior on beta_9 is normal with low precision
  r ~ dexp(1) # Prior on r
  
  # sensitivity analysis priors (shifted mean, higher precision; uncomment to rerun)
  # beta_0 ~ dnorm(0.5, 1.0E-2)
  # beta_1 ~ dnorm(0.5, 1.0E-2)
  # beta_2 ~ dnorm(0.5, 1.0E-2)
  # beta_3 ~ dnorm(0.5, 1.0E-2)
  # beta_4 ~ dnorm(0.5, 1.0E-2)
  # beta_5 ~ dnorm(0.5, 1.0E-2)
  # beta_6 ~ dnorm(0.5, 1.0E-2)
  # beta_7 ~ dnorm(0.5, 1.0E-2)
  # beta_8 ~ dnorm(0.5, 1.0E-2)
  # beta_9 ~ dnorm(0.5, 1.0E-2)
  # r ~ dexp(1)
  
  # Define alphas
  alpha_0 <- - beta_0 / r
  alpha_1 <- - beta_1 / r
  alpha_2 <- - beta_2 / r
  alpha_3 <- - beta_3 / r
  alpha_4 <- - beta_4 / r
  alpha_5 <- - beta_5 / r  
  alpha_6 <- - beta_6 / r
  alpha_7 <- - beta_7 / r
  alpha_8 <- - beta_8 / r
  alpha_9 <- - beta_9 / r
  
  # Percentage increases
  percentage_increase_year <- 100*(exp(alpha_1) - 1)
  percentage_increase_UsPres <- 100*(exp(alpha_2) - 1)
  percentage_increase_ChinaEmp <- 100*(exp(alpha_3) - 1)
  percentage_increase_DalaiLama <- 100*(exp(alpha_4) - 1)
  percentage_increase_JapanEmp <- 100*(exp(alpha_5) - 1)
  percentage_increase_BYUsPres <- 100*(exp(alpha_6) - 1)
  percentage_increase_BYChinaEmp <- 100*(exp(alpha_7) - 1)
  percentage_increase_BYDalaiLama <- 100*(exp(alpha_8) - 1)
  percentage_increase_BYJapanEmp <- 100*(exp(alpha_9) - 1)
  
  # Predictive distribution of age at the new values
  beta_Francis <- beta_0 + beta_1*year_Francis
  mu_Francis <- exp(beta_Francis)
  survival_Francis ~ dweib(r, mu_Francis) %_% T(present_length_Francis, upper_length)
  age_Francis_predictive <- survival_Francis

  beta_Obama <- beta_0 + (beta_1+beta_6)*year_Obama + beta_2
  mu_Obama <- exp(beta_Obama)
  survival_Obama ~ dweib(r, mu_Obama) %_% T(present_length_Obama, upper_length)
  age_Obama_predictive <- survival_Obama

  beta_Dalai <- beta_0 + (beta_1+beta_8)*year_Dalai + beta_4
  mu_Dalai <- exp(beta_Dalai)
  survival_Dalai ~ dweib(r, mu_Dalai) %_% T(present_length_Dalai, upper_length)
  age_Dalai_predictive <- survival_Dalai
  
  beta_Naruhito <- beta_0 + (beta_1+beta_9)*year_Naruhito + beta_5
  mu_Naruhito <- exp(beta_Naruhito)
  survival_Naruhito ~ dweib(r, mu_Naruhito) %_% T(present_length_Naruhito, upper_length)
  age_Naruhito_predictive <- survival_Naruhito
}
censored_dex <- which(censored == 1) # indices of censored (still-living) leaders

z_censored <- rep(0, length(censored_dex))
t_censored <- censoring_limits[censored_dex]
x_1_censored <- x_1[censored_dex] 
x_2_censored <- x_2[censored_dex] 
x_3_censored <- x_3[censored_dex]
x_4_censored <- x_4[censored_dex]
x_5_censored <- x_5[censored_dex]
x_6_censored <- x_6[censored_dex]
x_7_censored <- x_7[censored_dex]
x_8_censored <- x_8[censored_dex]
x_9_censored <- x_9[censored_dex]

n_censored <- length(censored_dex)

survival_non_censored <- survival[-censored_dex] # keep only observed lifetimes (drop all living leaders)
x_1_non_censored <- x_1[-censored_dex]
x_2_non_censored <- x_2[-censored_dex]
x_3_non_censored <- x_3[-censored_dex]
x_4_non_censored <- x_4[-censored_dex]
x_5_non_censored <- x_5[-censored_dex]
x_6_non_censored <- x_6[-censored_dex]
x_7_non_censored <- x_7[-censored_dex]
x_8_non_censored <- x_8[-censored_dex]
x_9_non_censored <- x_9[-censored_dex]

n_non_censored <- length(survival_non_censored)

birth_year_Francis <- leaders_data$Birth.Year[leaders_data$Name == "Francis"]
year_Francis <- birth_year_Francis - mean(leaders_nopred$Birth.Year)
present_length_Francis <- leaders_data$Age.Event[leaders_data$Name =="Francis"]

birth_year_Obama <- leaders_data$Birth.Year[leaders_data$Name == "Barack Obama"]
year_Obama <- birth_year_Obama - mean(leaders_nopred$Birth.Year)
present_length_Obama <- leaders_data$Age.Event[leaders_data$Name =="Barack Obama"]

birth_year_Dalai <- leaders_data$Birth.Year[leaders_data$Name == "14th Dalai Lama (Tenzin Gyatso)"]
year_Dalai <- birth_year_Dalai - mean(leaders_nopred$Birth.Year)
present_length_Dalai <- leaders_data$Age.Event[leaders_data$Name =="14th Dalai Lama (Tenzin Gyatso)"]

birth_year_Naruhito <- leaders_data$Birth.Year[leaders_data$Name == "Naruhito (Emperor Reiwa)"]
year_Naruhito <- birth_year_Naruhito - mean(leaders_nopred$Birth.Year)
present_length_Naruhito <- leaders_data$Age.Event[leaders_data$Name =="Naruhito (Emperor Reiwa)"]

upper_length <- 100


data_build_model <- list("n_censored",
                                   "n_non_censored",
                                   "z_censored",
                                   "t_censored",
                                   "x_1_censored",
                                   "x_2_censored",
                                   "x_3_censored",
                                   "x_4_censored",
                                   "x_5_censored",
                                   "x_6_censored",
                                   "x_7_censored",
                                   "x_8_censored",
                                   "x_9_censored",
                                   "survival_non_censored",
                                   "x_1_non_censored",
                                   "x_2_non_censored",
                                   "x_3_non_censored",
                                   "x_4_non_censored",
                                   "x_5_non_censored",
                                   "x_6_non_censored",
                                   "x_7_non_censored",
                                   "x_8_non_censored",
                                   "x_9_non_censored", 
                                   "year_Francis",
                                   "year_Obama", 
                                   "year_Dalai",
                                   "year_Naruhito",
                                   "present_length_Francis", 
                                   "present_length_Obama",
                                   "present_length_Dalai",
                                   "present_length_Naruhito",
                                   "upper_length")

model_output <- jags(data = data_build_model,  
                                      parameters.to.save = c("beta_0",
                                                             "beta_1",
                                                             "beta_2", 
                                                             "beta_3", 
                                                             "beta_4", 
                                                             "beta_5",
                                                             "beta_6",
                                                             "beta_7", 
                                                             "beta_8", 
                                                             "beta_9",
                                                             "r", 
                                                             "alpha_0",
                                                             "alpha_1",
                                                             "alpha_2",
                                                             "alpha_3",
                                                             "alpha_4",
                                                             "alpha_5",
                                                             "alpha_6",
                                                             "alpha_7",
                                                             "alpha_8",
                                                             "alpha_9",
                                                             "percentage_increase_year",
                                                             "percentage_increase_UsPres",
                                                             "percentage_increase_ChinaEmp",
                                                             "percentage_increase_DalaiLama",
                                                             "percentage_increase_JapanEmp", 
                                                             "percentage_increase_BYUsPres",
                                                            "percentage_increase_BYChinaEmp",
                                                          "percentage_increase_BYDalaiLama",
                                                            "percentage_increase_BYJapanEmp",
                                                             "age_Francis_predictive",
                                                             "age_Obama_predictive",
                                                             "age_Dalai_predictive",
                                                             "age_Naruhito_predictive"#,
                                                             #"survival_non_censored"
                                                             ), 

                                      n.iter = 50000, 
                                      n.chains = 3,
                                      model.file = build_model)
## module glm loaded
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 173
##    Unobserved stochastic nodes: 15
##    Total graph size: 2430
## 
## Initializing model
model_output
## Inference for Bugs model at "/var/folders/hs/d7t9jp1x3t7cwq2xz5rf7f8h0000gn/T//Rtmpmyq4PV/model163e4e9db5db.txt", fit using jags,
##  3 chains, each with 50000 iterations (first 25000 discarded), n.thin = 25
##  n.sims = 3000 iterations saved
##                                  mu.vect sd.vect     2.5%      25%      50%
## age_Dalai_predictive              92.468   4.298   85.459   88.707   92.366
## age_Francis_predictive            92.682   4.674   84.286   88.823   93.058
## age_Naruhito_predictive           85.231  10.457   63.166   77.838   87.110
## age_Obama_predictive              84.391  11.078   61.557   76.479   86.250
## alpha_0                            5.389   0.217    4.959    5.249    5.388
## alpha_1                            0.000   0.000    0.000    0.000    0.000
## alpha_2                           -0.145   0.102   -0.345   -0.211   -0.144
## alpha_3                           -0.319   0.064   -0.444   -0.363   -0.318
## alpha_4                           -0.404   0.077   -0.553   -0.458   -0.405
## alpha_5                           -0.172   0.058   -0.285   -0.211   -0.172
## alpha_6                            0.000   0.001   -0.001    0.000    0.000
## alpha_7                            0.000   0.000   -0.001    0.000    0.000
## alpha_8                           -0.002   0.000   -0.003   -0.002   -0.002
## alpha_9                            0.000   0.000   -0.001    0.000    0.000
## beta_0                           -22.592   1.468  -25.522  -23.552  -22.573
## beta_1                            -0.002   0.001   -0.003   -0.002   -0.002
## beta_2                             0.606   0.423   -0.242    0.330    0.606
## beta_3                             1.336   0.259    0.832    1.159    1.342
## beta_4                             1.690   0.317    1.044    1.485    1.704
## beta_5                             0.721   0.237    0.251    0.569    0.721
## beta_6                            -0.002   0.002   -0.007   -0.004   -0.002
## beta_7                             0.000   0.001   -0.002    0.000    0.000
## beta_8                             0.007   0.002    0.003    0.006    0.007
## beta_9                             0.000   0.001   -0.002   -0.001    0.000
## percentage_increase_BYChinaEmp    -0.011   0.033   -0.076   -0.032   -0.011
## percentage_increase_BYDalaiLama   -0.162   0.042   -0.250   -0.189   -0.161
## percentage_increase_BYJapanEmp    -0.006   0.030   -0.065   -0.026   -0.006
## percentage_increase_BYUsPres       0.049   0.058   -0.064    0.009    0.047
## percentage_increase_ChinaEmp     -27.184   4.638  -35.869  -30.471  -27.262
## percentage_increase_DalaiLama    -33.024   5.190  -42.501  -36.740  -33.301
## percentage_increase_JapanEmp     -15.695   4.871  -24.815  -19.061  -15.813
## percentage_increase_UsPres       -13.044   8.918  -29.152  -19.032  -13.446
## percentage_increase_year           0.039   0.018    0.005    0.027    0.039
## r                                  4.195   0.256    3.693    4.030    4.182
## deviance                        1433.038   4.775 1425.893 1429.511 1432.229
##                                      75%    97.5%  Rhat n.eff
## age_Dalai_predictive              96.114   99.608 1.001  3000
## age_Francis_predictive            96.830   99.677 1.001  2600
## age_Naruhito_predictive           94.227   99.367 1.001  3000
## age_Obama_predictive              93.851   99.480 1.002  1500
## alpha_0                            5.528    5.832 1.003   900
## alpha_1                            0.001    0.001 1.001  3000
## alpha_2                           -0.078    0.060 1.001  3000
## alpha_3                           -0.275   -0.198 1.001  3000
## alpha_4                           -0.353   -0.250 1.003   970
## alpha_5                           -0.134   -0.059 1.003   850
## alpha_6                            0.001    0.002 1.001  3000
## alpha_7                            0.000    0.001 1.001  3000
## alpha_8                           -0.001   -0.001 1.002  2000
## alpha_9                            0.000    0.001 1.001  3000
## beta_0                           -21.640  -19.746 1.008   280
## beta_1                            -0.001    0.000 1.001  3000
## beta_2                             0.887    1.437 1.001  3000
## beta_3                             1.513    1.826 1.003  1100
## beta_4                             1.909    2.275 1.005   470
## beta_5                             0.879    1.177 1.004   540
## beta_6                             0.000    0.003 1.001  3000
## beta_7                             0.001    0.003 1.001  3000
## beta_8                             0.008    0.010 1.002  1300
## beta_9                             0.001    0.003 1.001  3000
## percentage_increase_BYChinaEmp     0.012    0.053 1.001  3000
## percentage_increase_BYDalaiLama   -0.133   -0.083 1.002  2000
## percentage_increase_BYJapanEmp     0.014    0.056 1.001  3000
## percentage_increase_BYUsPres       0.088    0.162 1.001  3000
## percentage_increase_ChinaEmp     -24.027  -17.986 1.001  3000
## percentage_increase_DalaiLama    -29.773  -22.153 1.003   920
## percentage_increase_JapanEmp     -12.545   -5.761 1.003   840
## percentage_increase_UsPres        -7.521    6.140 1.001  3000
## percentage_increase_year           0.051    0.076 1.001  3000
## r                                  4.356    4.723 1.004   550
## deviance                        1435.825 1444.160 1.001  2600
## 
## For each parameter, n.eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor (at convergence, Rhat=1).
## 
## DIC info (using the rule, pD = var(deviance)/2)
## pD = 11.4 and DIC = 1444.4
## DIC is an estimate of expected predictive error (lower deviance is better).
library(coda)
parameters = c("beta_0", "beta_1", "beta_2", "beta_3", "beta_4", "beta_5", 
               "beta_6", "beta_7",  "beta_8", "beta_9", "r")

res = data.frame(model_output$BUGSoutput$sims.matrix)[,parameters]
colnames(res) <- gsub("^.*\\.","", colnames(res) )

# lapply(seq(1,length(res)), function(i) {
#   traceplot(as.mcmc(res[,i]), smooth = FALSE, type = "l",
#             xlab = "Iterations", ylab = colnames(res)[i],
#             main = paste("Traceplot of ", colnames(res)[i]))
# })

tps <- function(var){
  ggplot(res, aes(y = .data[[var]], x = seq_len(nrow(res)))) +
    geom_line() +
    labs(title = paste("Traceplot of", var),
         x = "Iterations", y = var)
}
lapply(names(res), tps)
## (11 traceplots, one per parameter: beta_0 through beta_9 and r)

library(ggplot2)
res_lag1 = lapply(seq_along(res), function(i) {
  lres = lag(res[,i],1)
  plot(y=res[,i], x= lres, 
       xlab = paste0(colnames(res)[i], "-1"),
       ylab = paste0(colnames(res)[i]),
       main = paste("Lag-1 Scatter Plot of", colnames(res)[i]))
})

lapply(seq(1,length(res)), function(i) { 
  acf(res[,i], xlab = "Lag", ylab = "ACF", 
            main = paste("ACF Plot of ", colnames(res)[i]))
})

## [[1]]
## 
## Autocorrelations of series 'res[, i]', by lag
## 
##      0      1      2      3      4      5      6      7      8      9     10 
##  1.000 -0.003 -0.030 -0.003  0.047  0.004  0.004 -0.021 -0.018  0.013  0.011 
##     11     12     13     14     15     16     17     18     19     20     21 
## -0.004  0.013 -0.005  0.004  0.018  0.012 -0.019 -0.005  0.003  0.017 -0.017 
##     22     23     24     25     26     27     28     29     30     31     32 
## -0.015 -0.001 -0.027  0.017  0.007 -0.017 -0.046 -0.006 -0.009  0.028  0.015 
##     33     34 
## -0.020 -0.002 
## 
## [[2]]
## 
## Autocorrelations of series 'res[, i]', by lag
## 
##      0      1      2      3      4      5      6      7      8      9     10 
##  1.000 -0.013 -0.011 -0.030 -0.002  0.030 -0.002 -0.011 -0.039  0.008 -0.030 
##     11     12     13     14     15     16     17     18     19     20     21 
## -0.019 -0.004 -0.017 -0.038 -0.002  0.015  0.010 -0.018  0.025 -0.005 -0.003 
##     22     23     24     25     26     27     28     29     30     31     32 
## -0.016  0.012  0.026 -0.003 -0.029 -0.021  0.004  0.031 -0.026 -0.008 -0.009 
##     33     34 
## -0.027 -0.018 
## 
## [[3]]
## 
## Autocorrelations of series 'res[, i]', by lag
## 
##      0      1      2      3      4      5      6      7      8      9     10 
##  1.000  0.027  0.002 -0.020 -0.018 -0.006  0.014  0.008  0.010  0.016  0.008 
##     11     12     13     14     15     16     17     18     19     20     21 
## -0.022 -0.014  0.024  0.020  0.013 -0.020 -0.034 -0.022 -0.022  0.010 -0.018 
##     22     23     24     25     26     27     28     29     30     31     32 
## -0.010  0.019 -0.013 -0.048 -0.023 -0.009  0.023 -0.010 -0.027  0.004 -0.011 
##     33     34 
## -0.001 -0.010 
## 
## [[4]]
## 
## Autocorrelations of series 'res[, i]', by lag
## 
##      0      1      2      3      4      5      6      7      8      9     10 
##  1.000 -0.001  0.007 -0.013  0.015 -0.032  0.036 -0.012 -0.008 -0.026  0.001 
## (Autocorrelation tables for the remaining parameter chains omitted: in every
## chain the sample autocorrelation is 1 at lag 0 and stays within roughly
## +/- 0.05 for lags 1 through 34.)

III d. Model Diagnostics

The combination of traceplots, lag-1 scatterplots, and ACF plots suggests that the chain for each parameter has converged and that 50,000 iterations are sufficient. The Rhat values are all close to 1, another indicator of convergence. All of the effective sample sizes are greater than 200, except for those of beta_0 (the intercept) and r.
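To illustrate what the Rhat diagnostic measures, here is a minimal sketch on simulated chains. This is the basic (non-split) Gelman-Rubin statistic, a simplification of the diagnostic reported with JAGS output, not the exact computation JAGS performs:

```python
# Sketch: the Gelman-Rubin Rhat statistic for m chains of length n.
# Values near 1 indicate the chains have mixed.
import random
import statistics as st

def rhat(chains):
    n = len(chains[0])                                # draws per chain
    means = [st.fmean(c) for c in chains]
    W = st.fmean([st.variance(c) for c in chains])    # within-chain variance
    B = n * st.variance(means)                        # between-chain variance
    var_plus = (n - 1) / n * W + B / n                # pooled variance estimate
    return (var_plus / W) ** 0.5

# Two well-mixed chains drawn around the same value give Rhat close to 1.
random.seed(0)
chains = [[random.gauss(0, 1) for _ in range(5000)] for _ in range(2)]
print(round(rhat(chains), 3))
```

In practice the split version of this statistic (each chain cut in half) is preferred, since it also detects within-chain trends.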

IV. Analysis

As Stander et al (2018) pointed out, under this Weibull distribution \(\log(T_i)\) is equal in distribution to \(-\frac{1}{r} (\beta_0 + \beta_1x_{i1}+\beta_2x_{i2}+...+\beta_9x_{i1}x_{i5}) + \frac{1}{r}\log(\epsilon)\), where \(\epsilon \sim \text{Exp}(1)\). Defining \(\alpha_j = -\beta_j/r\) for \(j = 1,2,...,9\), the coefficients act multiplicatively on lifespan, and their interpretation depends on the interaction terms (i.e. on both year of birth and the type of leadership).
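This equality in distribution can be checked directly from the Weibull survival function used in the model (a brief derivation, writing \(\mu_i\) for the exponentiated linear predictor, matching the "mus" computed in the residual code below):

```latex
S_i(t) = \exp\left(-\mu_i t^{r}\right), \qquad
\mu_i = \exp\left(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_9 x_{i1} x_{i5}\right),
\\[4pt]
\epsilon := \mu_i T_i^{\,r} \;\Rightarrow\;
\Pr(\epsilon > e) = S_i\!\left((e/\mu_i)^{1/r}\right) = e^{-e}
\;\Rightarrow\; \epsilon \sim \mathrm{Exp}(1),
\\[4pt]
T_i = (\epsilon/\mu_i)^{1/r} \;\Rightarrow\;
\log T_i = -\tfrac{1}{r}\log \mu_i + \tfrac{1}{r}\log \epsilon .
```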

For example, if a Pope is born one year later, he is expected to live longer by a multiplicative factor of \(\exp(\alpha_1)\); equivalently, his lifespan is expected to increase by \(100(\exp(\alpha_1)-1)\)%. If a U.S. President is born one year later, he is expected to live longer by a multiplicative factor of \(\exp(\alpha_1 + \alpha_6)\). If a U.S. President and a Chinese Emperor were born in the same (centered) year \(y\), the Chinese Emperor is expected to live longer by a multiplicative factor of \(\exp(\alpha_3 - \alpha_2 + (\alpha_7-\alpha_6)y)\), and so on.
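As a worked example of this conversion, the posterior means from the coefficient table in the Results section give the following multiplicative lifetime effects (a sketch; the numbers are the reported posterior means, not fresh draws):

```python
# Converting Weibull PH coefficients to AFT multiplicative lifetime
# effects, using the posterior means reported in the coefficient table.
import math

r = 4.1947291        # posterior mean of the Weibull shape
beta_1 = -0.0016443  # birth-year main effect (Popes are the baseline)
beta_6 = -0.0020307  # birth-year x US-President interaction

# alpha_j = -beta_j / r maps each beta onto the log-lifetime scale
alpha_1 = -beta_1 / r
alpha_6 = -beta_6 / r

# A Pope born one year later: lifespan multiplied by exp(alpha_1)
pope_factor = math.exp(alpha_1)
# A US President born one year later: exp(alpha_1 + alpha_6)
pres_factor = math.exp(alpha_1 + alpha_6)

print(round(pope_factor, 6))  # slightly above 1: a small yearly increase
print(round(pres_factor, 6))
```

Both factors are barely above 1, consistent with the small magnitudes of beta_1 and beta_6.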

people = c("age_Francis_predictive", "age_Obama_predictive", "age_Dalai_predictive",
"age_Naruhito_predictive")
four_pred = data.frame(model_output$BUGSoutput$sims.matrix)[,people]

print(paste("P(Dalai Lifespan > Francis Lifespan)=", mean(four_pred$age_Dalai_predictive>four_pred$age_Francis_predictive)))
## [1] "P(Dalai Lifespan > Francis Lifespan)= 0.480666666666667"
print(paste("P(Obama Lifespan > Naruhito Lifespan)=", mean(four_pred$age_Obama_predictive>four_pred$age_Naruhito_predictive)))
## [1] "P(Obama Lifespan > Naruhito Lifespan)= 0.483666666666667"

V. Results

A summary of the posterior distributions of the model parameters is given in the coefficient table below.

Our model predicts that the 85-year-old 14th Dalai Lama is expected to live 92.4 years (95% credible interval: 85.4 - 99.6 years), the 84-year-old Pope Francis is expected to live 92.4 years (95% credible interval: 84.2 - 99.4 years), the 60-year-old Japanese Emperor Naruhito is expected to live 85.3 years (95% credible interval: 66.4 - 99.5 years), and the 60-year-old former President Barack Obama is expected to live 84.8 years (95% credible interval: 61.5 - 99.4 years).
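These predictive draws come from a Weibull distribution truncated below at each living leader's current age and above at 100 years. A minimal sketch of one such draw by inverting the truncated CDF (the shape 4.2 is roughly the posterior mean of r; the scale of 95 and the bounds 85 and 100 are illustrative assumptions, not fitted values):

```python
# Sketch: sampling from a Weibull(shape, scale) truncated to
# [lower, upper] by inverse-CDF sampling on the restricted range.
import math
import random

def rtrunc_weibull(shape, scale, lower, upper):
    """One draw from Weibull(shape, scale) truncated to [lower, upper]."""
    cdf = lambda t: 1.0 - math.exp(-((t / scale) ** shape))
    u = random.uniform(cdf(lower), cdf(upper))   # uniform on the truncated range
    # invert the Weibull CDF: t = scale * (-log(1 - u))**(1/shape)
    return scale * (-math.log(1.0 - u)) ** (1.0 / shape)

random.seed(1)
# e.g. a leader currently aged 85, with lifespans capped at 100
draws = [rtrunc_weibull(4.2, 95.0, 85.0, 100.0) for _ in range(10_000)]
print(min(draws), max(draws))  # all draws fall inside [85, 100]
```

The truncation below at the current age is what encodes the fact that a living leader has already survived that long.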

Posterior Predictive Plots (2 pairs)

Francis and Dalai Lama

francis_dalai = data.frame(
  plifespans = c(four_pred$age_Francis_predictive, four_pred$age_Dalai_predictive),
  person = c(rep("Francis", nrow(four_pred)), rep("Dalai", nrow(four_pred))))

mins = francis_dalai %>%
  group_by(person) %>%
  summarise(min_vals = round(min(plifespans), 2))
meds = francis_dalai %>%
  group_by(person) %>%
  summarise(med_vals = round(median(plifespans), 2))
val95 = francis_dalai %>%
  group_by(person) %>%
  summarise(vals_95 = round(quantile(plifespans, 0.95), 2))
francis_txt = paste(paste("Current age:", mins$min_vals[mins$person=="Francis"], "yrs"),
      paste("Median:", meds$med_vals[meds$person=="Francis"],"yrs"),
      paste("95% quantile:", val95$vals_95[val95$person=="Francis"], "yrs"),
      sep = "\n")

dalai_txt = paste(paste("Current age:", mins$min_vals[mins$person=="Dalai"], "yrs"),
      paste("Median:",  meds$med_vals[meds$person=="Dalai"],"yrs"),
      paste("95% quantile:", val95$vals_95[val95$person=="Dalai"], "yrs"),
      sep = "\n")

plot_txt <- data.frame(
  label = c(francis_txt, dalai_txt),
  person = c("Francis", "Dalai")
)

ggplot(francis_dalai, aes(x=plifespans)) + 
  geom_histogram(aes(y=..density..), binwidth = 0.8, boundary = round(min(francis_dalai$plifespans), 2)) +
  facet_wrap(~person) +
  geom_vline(data=mins, aes(xintercept=min_vals), color = "black")+
  geom_vline(data=meds, aes(xintercept=med_vals), color = "blue")+
  geom_vline(data=val95, aes(xintercept=vals_95), color = "red")+
  geom_hline(yintercept = 0, colour = "black") +
  xlab("Lifespan (yrs)") + ylab("Predictive Probability density") + 
  ylim(c(0, 0.1)) + 
  geom_text(data = plot_txt, mapping = aes(x = 88.5, y = 0.09, label = label), size = 2)

Obama and Naruhito

obama_naru = data.frame(
  plifespans = c(four_pred$age_Obama_predictive, four_pred$age_Naruhito_predictive),
  person = c(rep("Obama", nrow(four_pred)), rep("Naruhito", nrow(four_pred))))

mins = obama_naru %>%
  group_by(person) %>%
  summarise(min_vals = round(min(plifespans), 2))
meds = obama_naru %>%
  group_by(person) %>%
  summarise(med_vals = round(median(plifespans), 2))
val95 = obama_naru %>%
  group_by(person) %>%
  summarise(vals_95 = round(quantile(plifespans, 0.95), 2))
obama_txt = paste(paste("Current age:", mins$min_vals[mins$person=="Obama"], "yrs"),
      paste("Median:", meds$med_vals[meds$person=="Obama"],"yrs"),
      paste("95% quantile:", val95$vals_95[val95$person=="Obama"], "yrs"),
      sep = "\n")

naru_txt = paste(paste("Current age:", mins$min_vals[mins$person=="Naruhito"], "yrs"),
      paste("Median:",  meds$med_vals[meds$person=="Naruhito"],"yrs"),
      paste("95% quantile:", val95$vals_95[val95$person=="Naruhito"], "yrs"),
      sep = "\n")

plot_txt <- data.frame(
  label = c(obama_txt, naru_txt),
  person = c("Obama", "Naruhito")
)

ggplot(obama_naru, aes(x=plifespans)) + 
  geom_histogram(aes(y=..density..), binwidth = 0.8,boundary = round(min(obama_naru$plifespans), 2)) +
  facet_wrap(~person) +
  geom_vline(data=mins, aes(xintercept=min_vals), color = "black")+
  geom_vline(data=meds, aes(xintercept=med_vals), color = "blue")+
  geom_vline(data=val95, aes(xintercept=vals_95), color = "red")+
  geom_hline(yintercept = 0, colour = "black") +
  xlab("Lifespan (yrs)") + ylab("Predictive Probability density") +
  ylim(c(0, 0.06)) +
  geom_text(data = plot_txt, mapping = aes(x = 72, y = 0.05, label = label), size = 3)

The posterior predictive distribution of the lifespan of the 14th Dalai Lama is closer to uniform, with its mode in the 90s. The posterior predictive distribution of the lifespan of Pope Francis is somewhat left skewed, with its mode in the early 90s. The posterior predictive distributions of the lifespans of President Obama and Emperor Naruhito are both left skewed, with modes in the late 90s.

The probability that the 14th Dalai Lama will have a longer lifespan than Pope Francis is 0.481. The probability that President Obama will have a longer lifespan than Emperor Naruhito is 0.484.

# select betas and r from output and put into a tidy table
vars = c("beta_0", "beta_1", "beta_2", "beta_3", "beta_4", "beta_5", "beta_6", 
                    "beta_7", "beta_8", "beta_9", "r")
output <- data.frame(model_output$BUGSoutput$summary)
output <- output[row.names(output) %in% vars, ]
coef <- output %>%
  mutate("Variable" = vars,
         "Mean" = output$mean,
         "Standard Deviation" = output$sd,
         "2.5% Quantile" = output$X2.5.,
         "Median" = output$X50.,
         "97.5% Quantile" = output$X97.5.
         ) %>%
  select("Variable", "Mean", "Standard Deviation", "2.5% Quantile", "Median", "97.5% Quantile")
kable(coef)
| Variable | Mean | Standard Deviation | 2.5% Quantile | Median | 97.5% Quantile |
|:---------|-----------:|------------------:|-------------:|-----------:|--------------:|
| beta_0 | -22.5915879 | 1.4680033 | -25.5221058 | -22.5730566 | -19.7465000 |
| beta_1 | -0.0016443 | 0.0007534 | -0.0031519 | -0.0016419 | -0.0002007 |
| beta_2 | 0.6059671 | 0.4230495 | -0.2422546 | 0.6059599 | 1.4371459 |
| beta_3 | 1.3355461 | 0.2587216 | 0.8319711 | 1.3420033 | 1.8255342 |
| beta_4 | 1.6900917 | 0.3171348 | 1.0439544 | 1.7043228 | 2.2750221 |
| beta_5 | 0.7206409 | 0.2366567 | 0.2509796 | 0.7212037 | 1.1769018 |
| beta_6 | -0.0020307 | 0.0024308 | -0.0067542 | -0.0019584 | 0.0026365 |
| beta_7 | 0.0004541 | 0.0013653 | -0.0022375 | 0.0004524 | 0.0031630 |
| beta_8 | 0.0067854 | 0.0017555 | 0.0033960 | 0.0067710 | 0.0104489 |
| beta_9 | 0.0002448 | 0.0012659 | -0.0023251 | 0.0002513 | 0.0026518 |
| r | 4.1947291 | 0.2561452 | 3.6925186 | 4.1820203 | 4.7225935 |
# calculate and create graphs for standardized residuals

#select estimated mean betas
beta_0_est <- coef[1,2]
beta_1_est <- coef[2,2]
beta_2_est <- coef[3,2]
beta_3_est <- coef[4,2]
beta_4_est <- coef[5,2]
beta_5_est <- coef[6,2]
beta_6_est <- coef[7,2]
beta_7_est <- coef[8,2]
beta_8_est <- coef[9,2]
beta_9_est <- coef[10,2]

# select only non-living leaders to calculate residuals for
no_living <- leaders_data %>%
  filter(Censored == 0)

# mutate data to calculate mus
estimates <- no_living %>%
  mutate(Birth.Year.Cent = Birth.Year - mean(leaders_nopred$Birth.Year)) %>%
  mutate(ChinaEmp =case_when(Type == "ChinaEmp" ~ 1, TRUE ~ 0),
         DalaiLama =case_when(Type == "DalaiLama" ~ 1, TRUE ~ 0),
         JapanEmp =case_when(Type == "JapanEmp" ~ 1, TRUE ~ 0),
         Pope =case_when(Type == "Pope" ~ 1, TRUE ~ 0),
         UsPres =case_when(Type == "UsPres" ~ 1, TRUE ~ 0)) %>%
  mutate(BYUsPres = Birth.Year.Cent*UsPres) %>%
  mutate(BYChinaEmp = Birth.Year.Cent*ChinaEmp) %>%
  mutate(BYDalaiLama = Birth.Year.Cent*DalaiLama) %>%
  mutate(BYJapanEmp = Birth.Year.Cent*JapanEmp) %>%
  mutate(intercept = 1) %>%
  mutate(predicted = exp(beta_0_est + beta_1_est*(Birth.Year.Cent) + beta_2_est*(UsPres) + beta_3_est*(ChinaEmp) + beta_4_est*(DalaiLama) + beta_5_est*(JapanEmp) + beta_6_est*(UsPres)*(Birth.Year.Cent) + beta_7_est*(ChinaEmp)*(Birth.Year.Cent) + beta_8_est*(DalaiLama)*(Birth.Year.Cent) + beta_9_est*(JapanEmp)*(Birth.Year.Cent)))

data.mat = as.matrix(estimates[,c("intercept", "Birth.Year.Cent", "UsPres", "ChinaEmp", "DalaiLama", "JapanEmp","BYUsPres", "BYChinaEmp", "BYDalaiLama", "BYJapanEmp")], nrow = nrow(estimates), byrow = TRUE)

pred_res = data.frame(t(model_output$BUGSoutput$summary))
r = pred_res$r[1]
betas = pred_res[,c("beta_0", "beta_1", "beta_2",  "beta_3", "beta_4", "beta_5",
                     "beta_6", "beta_7", "beta_8","beta_9")][1,]
beta.mat = as.matrix(betas, nrow = p, byrow = TRUE)
logmus = data.mat %*% t(beta.mat)

# install.packages("TruncatedDistributions", repos="http://R-Forge.R-project.org") 
library(TruncatedDistributions)

# draw predicted values based on mus calculated above
set.seed(123)
scales = as.vector((1/exp(logmus))^(1/r))
# rtweibull() recycles its arguments rather than vectorizing over the scale
# parameter, so draw one truncated-Weibull lifespan (bounded by [0, 100]) per leader
Ts = sapply(scales, function(s) rtweibull(1, r, s, 0, 100))
#calculate residuals
resid = estimates$Age.Event - Ts

#calculate standard deviations
meanTs = mean(Ts)
diff = Ts - meanTs
sd = sqrt((1/166)*(sum((diff)^2)) * (1+ (1/166) + (diff)^2/sum((diff)^2)))

# make plots
ggplot(data = estimates, aes(x = Ts, y = resid/sd)) + geom_point() + 
  labs(x = "Predicted Lifespan", y = "Standardized Residual")

ggplot(data = estimates, aes(x = Birth.Year, y = resid/sd)) + geom_point() +
  labs(x = "Birth Year", y = "Standardized Residual")

Posterior Predictive Checks

# non_censored_prediction <- filter(leaders_data, Censored == 0)
# K <- 10
# ysims <- matrix(nrow = 166, ncol = K)
# 
# for(k in 1:K){
#   for(i in 1:nrow(non_censored_prediction)){
#     w <- sample(nrow(res), 1)
#     samp <- res[w,]
#     val <- non_censored_prediction[i, ]
#     beta_temp <- samp$beta_0 + samp$beta_1*val$Birth.Year + samp$beta_2*as.integer(val$TypeUsPres) + samp$beta_3*as.integer(val$TypeChinaEmp) + samp$beta_4*as.integer(val$TypeDalaiLama) + samp$beta_5*as.integer(val$TypeJapanEmp) + samp$beta_6*val$Birth.Year*as.integer(val$TypeUsPres) + samp$beta_7*val$Birth.Year*as.integer(val$TypeChinaEmp) + samp$beta_8*val$Birth.Year*as.integer(val$TypeDalaiLama) + samp$beta_9*val$Birth.Year*as.integer(val$TypeJapanEmp)
#     mu <- exp(beta_temp)
#     t <- rtweibull(1, samp$r, (1/mu)^(1/samp$r), 50, 100)
#     ysims[i, k] <- t
#   }
#}
# r <- sample(nrow(non_censored_prediction), 1)
# hist(ysims[,1])
# abline(v=non_censored_prediction$Age.Event[r], col = "red")
# print(mean(ysims<0))
# print(mean(non_censored_prediction$Age.Event <0))

Lexis Diagram

#lg <- lexis_grid(1960, 2000, 30, 80, delta=5) 
#leaders_data$Initial.Age <- year(leaders_data$Event.Date) - leaders_data$Birth.Year
#subset <- filter(leaders_data, Censored == 0)
#subset <- subset[0:5,]
#ggplot() + geom_polygon(data = subset, mapping = aes(x = Birth.Year, y = Age.Event, group = Name)) + geom_point(data = subset, aes(x = Birth.Year, y = Birth.Year + Age.Event)) + geom_polygon(data = subset, mapping = aes(x = Birth.Year, y = ))
subset <- filter(leaders_data, Censored == 0)
ggplot(data = subset, aes(x = Birth.Year, Age.Event)) + geom_point(color="pink") + ggtitle("Distribution of Leaders' Ages") + labs(x = "Birth Year", y = "Age")

VI. Conclusion and Further Discussion

We sought to compare Popes, US Presidents, Dalai Lamas, Chinese Emperors, and Japanese Emperors to see how their lifespans compare. Based on the posterior summaries, birth year has a small but credibly nonzero effect on lifespan (the 95% credible interval for beta_1 excludes zero), and the interaction terms are largely consistent with zero, so the impact of birth year appears similar across leadership types, with the Dalai Lamas (beta_8) as the one exception.

This model could be improved by including more predictors; for example, health varies widely among people, and accounting for it could yield a better predictive model. For further work on this research question, it would be valuable to incorporate economic data on the countries the leaders grew up in, as the conditions a person lives in affect their lifespan.

VII. References

Stander, J., Dalla Valle, L., and Cortina-Borja, M. (2018). A Bayesian Survival Analysis of a Historical Dataset: How Long Do Popes Live? The American Statistician, 72(4).

VIII. Appendix